A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report
generated on 2025-05-19, 17:47 NZST
based on data in:
/scale_wlg_nobackup/filesets/nobackup/uoa03387/AG1491_Rory/all_results/Example_bacterial_genome/results
General Statistics
| Sample Name | ≥ 30X | Median | Mean Cov. | N50 (Kbp) | Assembly Length (Mbp) | Organism | Contigs | CDS | Reads mapped | % Streptomyces | % Top 5 Genus | % Escherichia coli | % Top 5 Species | % Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| consensus | 5.0% | 20X | 20.1X | 8571.5Kbp | 8.6Mbp | | | | | | | | | |
| consensus.stat | | | | | | | | | 0.1M | | | | | |
| kraken2_report_assembly | | | | | | | | | | 100.0% | 100.0% | | | |
| kraken2_report_unmapped | | | | | | | | | | | | 89.9% | 92.9% | 2.6% |
| unknown | | | | | | Streptomyces unknown | 1 | 7690 | | | | | | |
Mosdepth
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
URL: https://github.com/brentp/mosdepth
DOI: 10.1093/bioinformatics/btx699
Cumulative coverage distribution
Proportion of bases in the reference genome with at least a given depth of coverage, calculated across the entire genome length.
For a set of DNA or RNA reads mapped to a reference sequence, such as a genome or transcriptome, the depth of coverage at a given base position is the number of high-quality reads that map to the reference at that position, while the breadth of coverage is the fraction of the reference sequence to which reads have been mapped with at least a given depth of coverage (Sims et al. 2014).
Defining coverage breadth in terms of coverage depth is useful, because sequencing experiments typically require a specific minimum depth of coverage over the region of interest (Sims et al. 2014), so the extent of the reference sequence that is amenable to analysis is constrained to lie within regions that have sufficient depth. With inadequate sequencing breadth, it can be difficult to distinguish the absence of a biological feature (such as a gene) from a lack of data (Green 2007).
For increasing coverage depths (1×, 2×, …, N×), coverage breadth is calculated as the percentage of the reference sequence that is covered by at least that number of reads, and coverage breadth (y-axis) is then plotted against coverage depth (x-axis). This plot shows the relationship between sequencing depth and breadth for each read dataset, which can be used to gauge, for example, the likely effect of a minimum depth filter on the fraction of a genome available for analysis.
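The breadth-versus-depth calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not how mosdepth or this report computes the curve: the function name `coverage_breadth` and the toy list of per-base depths are assumptions made for the example.

```python
from collections import Counter


def coverage_breadth(depths, max_depth):
    """Return {d: % of positions with depth >= d} for d = 1..max_depth.

    `depths` is a hypothetical list holding one depth value per reference base.
    """
    total = len(depths)
    if total == 0:
        raise ValueError("no positions given")
    hist = Counter(depths)
    # Positions deeper than max_depth satisfy every threshold we report.
    covered = sum(n for d, n in hist.items() if d > max_depth)
    breadth = {}
    # Walk thresholds from high to low, accumulating positions as we go.
    for d in range(max_depth, 0, -1):
        covered += hist.get(d, 0)
        breadth[d] = 100.0 * covered / total
    return dict(sorted(breadth.items()))


# Toy per-base depths for a 10 bp "reference" (illustrative values only).
depths = [0, 1, 1, 2, 3, 3, 3, 5, 8, 10]
curve = coverage_breadth(depths, max_depth=5)
for depth, pct in curve.items():
    print(f"{depth}x: {pct:.1f}% of bases")
```

Reading the curve at a chosen threshold gives the fraction of the reference that would survive a minimum depth filter at that threshold.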
Average coverage per contig
Average coverage per contig or chromosome
QUAST
Quality assessment tool for genome assemblies.
URL: http://quast.bioinf.spbau.ru
DOI: 10.1093/bioinformatics/btt086
Assembly Statistics
| Sample Name | N50 (Kbp) | L50 (K) | Largest contig (Kbp) | Length (Mbp) |
|---|---|---|---|---|
| consensus | 8571.5Kbp | 0.0K | 8571.5Kbp | 8.6Mbp |
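For readers unfamiliar with the N50 and L50 columns, the sketch below shows the usual definitions: N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly length, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the contig lengths are assumptions for illustration; QUAST's own implementation may differ in detail.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            # `length` is the N50; `count` contigs were needed, so it is the L50.
            return length, count


# Hypothetical multi-contig assembly.
toy = n50_l50([50, 40, 30, 20, 10])

# A single-contig assembly; the length is approximated from the rounded
# 8571.5 Kbp value in the table, not an exact figure.
single = n50_l50([8_571_500])
print(toy, single)
```

For a single-contig assembly the L50 is 1. If the L50 column is displayed in thousands with one decimal place, a value of 1 would appear as 0.0K, which would be consistent with the table above.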
Number of Contigs
This plot shows the number of contigs found for each assembly, broken down by length.
Prokka
Rapid annotation of prokaryotic genomes.
URL: http://www.vicbioinformatics.com/software.prokka.shtml
DOI: 10.1093/bioinformatics/btu153
This barplot shows the distribution of different types of features found in each contig.
Prokka can detect different features:
- CDS
- rRNA
- tmRNA
- tRNA
- miscRNA
- signal peptides
- CRISPR arrays
NanoStat
Reports various statistics for long-read datasets in FASTQ, BAM, or albacore sequencing summary format (supports NanoPack; NanoPlot, NanoComp).
URL: https://github.com/wdecoster/nanostat; https://github.com/wdecoster/nanoplot
DOI: 10.1093/bioinformatics/bty149
Programs are part of the NanoPack family for summarising results of sequencing on Oxford Nanopore methods (MinION, PromethION, etc.).
Summary Statistics
| Sample Name | Median length | Read N50 | Median Qual | # Reads (K) | Total Bases (Mb) |
|---|---|---|---|---|---|
| NanoStats | 2451bp | 5233bp | 20.6 | 44.4K | 165.7Mb |
Reads by quality
Read counts categorised by read quality (Phred score).
Sequencing machines assign each generated read a quality score using the Phred scale. The Phred score represents the likelihood that a given read contains errors. High-quality reads have a high score.
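The Phred scale is logarithmic: a score Q corresponds to an error probability of 10^(-Q/10). The short sketch below converts in both directions. The function names are chosen for this example only and are not part of NanoStat.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

On this scale, the median quality of 20.6 reported in the summary table corresponds to an error probability of a little under 1%.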
Samtools
Toolkit for interacting with BAM/CRAM files.
URL: http://www.htslib.org
DOI: 10.1093/bioinformatics/btp352
Flagstat
This module parses the output from samtools flagstat.
Bracken
Computes the abundance of species in DNA sequences from a metagenomics sample.
URL: https://ccb.jhu.edu/software/bracken
DOI: 10.7717/peerj-cs.104
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for these top 5 taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that Kraken percentages don't always add to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxonomic rank (e.g. - or G2) is ignored.
Kraken
Taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence.
URL: https://ccb.jhu.edu/software/kraken
DOI: 10.1186/gb-2014-15-3-r46
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for these top 5 taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that Kraken percentages don't always add to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxonomic rank (e.g. - or G2) is ignored.
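The arithmetic behind the approximated total and the "Other" category can be sketched as follows. This is an illustration of the description above rather than the report's actual code: the function names and all read counts are hypothetical and are not taken from this dataset.

```python
def approximate_total_reads(unclassified_reads, unclassified_pct):
    """Estimate library size from the unclassified count and its percentage."""
    if unclassified_pct <= 0:
        raise ValueError("unclassified percentage must be positive")
    return round(unclassified_reads * 100 / unclassified_pct)


def other_count(total_reads, top5_counts, unclassified_reads):
    """Reads not in the top 5 taxa and not unclassified."""
    return total_reads - sum(top5_counts) - unclassified_reads


# Hypothetical values for one sample at one taxonomic rank.
unclassified = 250
unclassified_pct = 2.5
top5 = [8990, 150, 80, 50, 20]

total = approximate_total_reads(unclassified, unclassified_pct)
other = other_count(total, top5, unclassified)
print(total, other)
```

Because the reported percentage is rounded, the estimated total (and therefore "Other") can be off by a small number of reads, which is the rounding error the note above refers to. The estimate is also undefined when a sample has no unclassified reads, which the sketch handles by raising an error.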